29 research outputs found

    On the Geographic Location of Internet Resources

    One relatively unexplored question about the Internet's physical structure concerns the geographical location of its components: routers, links and autonomous systems (ASes). We study this question using two large inventories of Internet routers and links, collected by different methods and about two years apart. We first map each router to its geographical location using two different state-of-the-art tools. We then study the relationship between router location and population density; between geographic distance and link density; and between the size and geographic extent of ASes. Our findings are consistent across the two datasets and both mapping methods. First, as expected, router density per person varies widely over different economic regions; however, in economically homogeneous regions, router density shows a strong superlinear relationship to population density. Second, the probability that two routers are directly connected is strongly dependent on distance; our data is consistent with a model in which a majority (up to 75-95%) of link formation is based on geographical distance (as in the Waxman topology generation method). Finally, we find that ASes show high variability in geographic size, which is correlated with other measures of AS size (degree and number of interfaces). Small to medium ASes show wide variability in their geographic dispersal; however, all ASes exceeding a certain threshold in size are maximally dispersed geographically. These findings have many implications for the next generation of topology generators, which we envisage as producing router-level graphs annotated with attributes such as link latencies, AS identifiers and geographical locations.
    National Science Foundation (CCR-9706685, ANI-9986397, ANI-0095988, CAREER ANI-0093296); DARPA; CAID
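The distance-dependent link formation mentioned in the abstract can be illustrated with a minimal sketch of the Waxman link probability. The abstract does not give the formula or any parameter values; the form below, beta * exp(-d / (alpha * L)) with L the maximum distance between any two nodes, is one common convention, and the function name and default values are assumptions chosen only for illustration.

```python
import math

def waxman_link_probability(d, max_distance, alpha=0.1, beta=0.4):
    """Probability that two nodes at distance d are directly linked.

    Assumed convention: P = beta * exp(-d / (alpha * L)), where L is
    max_distance. alpha and beta are illustrative defaults, not values
    taken from the paper.
    """
    if max_distance <= 0:
        raise ValueError("max_distance must be positive")
    return beta * math.exp(-d / (alpha * max_distance))

# Nearby routers are far more likely to be linked than distant ones.
near = waxman_link_probability(10, max_distance=100)
far = waxman_link_probability(50, max_distance=100)
```

Under this form the probability decays exponentially with distance, which is the qualitative behaviour the abstract reports for most links.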

    Analysis of OD Flows (Raw Data)

    In a recent paper, Structural Analysis of Network Traffic Flows, we analyzed the set of Origin Destination traffic flows from the Sprint-Europe and Abilene backbone networks. This report presents the complete set of results from analyzing data from both networks. The results in this report are specific to the Sprint-1 and Abilene datasets studied in the above paper. The following results are presented here:
    1 Rows of Principal Matrix (V): 1.1 Sprint-1 Dataset; 1.2 Abilene Dataset
    2 Set of Eigenflows: 2.1 Sprint-1 Dataset; 2.2 Abilene Dataset
    3 Classifying Eigenflows: 3.1 Sprint-1 Dataset; 3.2 Abilene Dataset
    Centre National de la Recherche Scientifique (CNRS) France; Sprint Labs; Office of Naval Research (N000140310043); National Science Foundation (ANI-9986397, CCR-0325701

    Sampling Biases in IP Topology Measurements

    Considerable attention has been focused on the properties of graphs derived from Internet measurements. Router-level topologies collected via traceroute-like methods have led some to conclude that the router graph of the Internet is well modeled as a power-law random graph. In such a graph, the degree distribution of nodes follows a distribution with a power-law tail

    Mining anomalies using traffic feature distributions

    The increasing practicality of large-scale flow capture makes it possible to conceive of traffic analysis methods that detect and identify a large and diverse set of anomalies. However, the challenge of effectively analyzing this massive data source for anomaly diagnosis is as yet unmet. We argue that the distributions of packet features (IP addresses and ports) observed in flow traces reveal both the presence and the structure of a wide range of anomalies. Using entropy as a summarization tool, we show that the analysis of feature distributions leads to significant advances on two fronts: (1) it enables highly sensitive detection of a wide range of anomalies, augmenting detections by volume-based methods, and (2) it enables automatic classification of anomalies via unsupervised learning. We show that using feature distributions, anomalies naturally fall into distinct and meaningful clusters. These clusters can be used to automatically classify anomalies and to uncover new anomaly types. We validate our claims on data from two backbone networks (Abilene and Geant) and conclude that feature distributions show promise as a key element of a fairly general network anomaly diagnosis framework.
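As a rough illustration of using entropy to summarize a feature distribution, the sketch below computes the Shannon entropy (in bits) of the empirical distribution of one packet feature, such as destination port, within a time bin. The abstract does not state the exact formula; Shannon entropy over observed counts is an assumption, and the function name and sample values are hypothetical.

```python
import math
from collections import Counter

def sample_entropy(values):
    """Shannon entropy (bits) of the empirical distribution of values.

    Assumption: each item in values is one observed feature value,
    e.g. the destination port of one packet or flow in a time bin.
    """
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Traffic concentrated on one port has zero entropy; traffic spread
# evenly over many ports (as in a port scan) has high entropy.
concentrated = sample_entropy([80] * 8)
dispersed = sample_entropy([1, 2, 3, 4])
```

A sharp change in this value from one time bin to the next signals a change in how concentrated or dispersed the feature is, even when total traffic volume barely moves.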

    Multivariate online anomaly detection using kernel recursive least squares

    High-speed backbones are regularly affected by various kinds of network anomalies, ranging from malicious attacks to harmless large data transfers. Different types of anomalies affect the network in different ways, and it is difficult to know a priori how a potential anomaly will exhibit itself in traffic statistics. In this paper we describe an online, sequential anomaly detection algorithm that is suitable for use with multivariate data. The proposed algorithm is based on the kernel version of the recursive least squares algorithm. It assumes no model for network traffic or anomalies, and constructs and adapts a dictionary of features that approximately spans the subspace of normal behaviour. The algorithm raises an alarm immediately upon encountering a deviation from the norm. Through comparison with existing block-based offline methods based upon Principal Component Analysis, we demonstrate that our online algorithm is equally effective but has much faster time-to-detection and lower computational complexity. We also explore minimum volume set approaches in identifying the region of normality.

    Diagnosing Network-Wide Traffic Anomalies

    Anomalies are unusual and significant changes in a network's traffic levels, which can often involve multiple links. Diagnosing anomalies is critical for both network operators and end users. It is a difficult problem because one must extract and interpret anomalous patterns from large amounts of high-dimensional, noisy data